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Abstract — An exact, one-to-one transform is presented that 
not only allows digital circular convolutions, but is free from 
multiplications and quantisation errors for transform lengths 
of arbitrary powers of two. The transform is analogous to 
the Discrete Fourier Transform, with the canonical harmonics 
replaced by a set of cyclic integers computed using only bit-shifts 
and additions modulo a prime number. The prime number may 
be selected to occupy contemporary word sizes or to be very large 
for cryptographic or data hiding applications. The transform is 
an extension of the Rader Transforms via Carmichael's Theorem. 
These properties allow for exact convolutions that are impervious 
to numerical overflow and to utilise Fast Fourier Transform 
algorithms. 

Index Tewns— DSP-FAST; Number Theoretic Transform; Dis- 
crete Fourier Transform; Fast Fourier Transform; Fermat Num- 
ber Transform. 



I. Introduction 

The Discrete Fourier Transform (DFT) is commonly used to 
compute the circular convolution h of two finite (or periodic) 
sequences / and g of length N as 



w-i 



h{j)^f{j)*9{j)= > /(fc).5(j-fc), 



fe=0 



(1) 



by using the Convolution Theorem, where Eq. (1) can be 
computed simply as a product of both sequences in Discrete 
Fourier space. This theorem provides a computational advant- 
age because the Cooley-Tukey algorithm [1] for computing 
the DFT has a computational complexity of 0(A^log2 A^), as 
opposed to 0{N'^) for direct methods, when A^ is a power of 
two. 

A major result of this letter regarding convolutions can 
be summarised as follows. Let {a)m denote computing the 
remainder with respect to m (see Appendix A for details), 
where a G Z, i.e. a is an integer, and m is a prime number (or 
prime) as given in, but not restricted to. Table I. To compute 
the digital circular convolution of two finite integer sequences, 
one transforms both sequences as 



JV-l 

E 

t=0 



X(u)= V(.T(t)-2-*), 



(2) 



which only involves bit-shifting, modulo and addition opera- 
tions. The coefficients of these two sequences are multiplied 
and the result is inverted as 



x{t) 



1 



N~l 



u=0 



I r 



(3) 



where 2""* = 2<-"*>« = 2^^"*. Note that the convolution is 
free from round-off errors as no floating-point numbers are 
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required. Exact digital filtering involving division operations 
can be performed via multiplicative inverses, i.e. an integer 

N-^ so that {N-N-'^)„-, = 1- The Cooley-Tukey algorithm [1] 
is easily applied by replacing e^^'"/^ with the powers of two 
2", where a G Z and i^ = — 1. In other words, the A^* root of 
unity e^'^*^ = 1 is replaced with the integer-only equivalent 
(2^)„i = 1. The transform lengths permitted are A^ = 2" 
when n G Z that divide Ambx in Table I. For example, the 
prime 13631489 allows for all A^ up to and including 2^^. 
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(b) 





(c) 

Figure 1. The unit circle (a) and the digital circles for computing exact 
convolutions (b)-(d). The paths in (b)-(d) show the integers produced from 
successive powers of 3 modulo 7 (resulting in {1, 3, 2, 6, 4, 5, 1, . . .}) in (b), 
successive powers of 2 modulo 7 (resulting in {1, 2, 4, 1, 2, 4, 1, . . .}) in (c) 
and successive powers of 3 modulo 16 (resulting in {1, 3, 9, 11, 1, . . .}) in (d). 
The transform lengths permissible for each are A^Max = 6 (and its divisors) 
for (b), AfMax = 3 for (c) and A^Max = 4 for (d). 
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Max. Transform 


Corresponding 


Word 




Length A^Max 


Fermat Number 


Size 


641 


64 


Fz 


16-bit 


2424833 


1024 


F9 


32-bit 


319489 


4096 


Fii 


32-bit 


13631489 


524288 


F18 


32-bit 



Table I 

Prime values of the modulus m for different transform 

LENGTHS N. Lengths permissible are 2" ^ A''max- Note that the 

SIZE OF THE PRIME IS NOT NECESSARILY PROPORTIONAL TO Nmax- 



Equations (2) and (3) are an extension of the Rader 
Transforms (RTs) [2], which, until now, were only practical 
for small (A^ ^ 128) transform lengths, severely limiting its 
applications [3]. The theory developed in Sec. U and III of this 
letter applies Carmichael's Theorem to remove these limitations 



completely by generalising the concept of an integer based 
N^ root of unity. Table I shows that the moduli m chosen 
for very large transform lengths A^ can fit into a 32-bit word 
size because the primes are shown to be any (including the 
smallest) one of the factors of large Fermat numbers, which 
are numbers of the form 



F„ 



-,2" 



1. 



(4) 



Sec. Ill also presents a modulus free transform similar to 
Eqs (2) and (3), i.e. an integer-only transform without the need 
for modulo operations, using the same theory. 

The preservation of the Circular Convolution Property (CCP), 
which allows one to use the Convolution Theorem for finite 
sequences, is made possible because the unit circle of the DFT 
(see Fig. 1(a)) is replaced with the digital "circle" 



2'"^ = 1 (mod™), 



(5) 



so that 2^ — 1 is a multiple of m (see Figs 1(c) and 2). The 
successive powers {1, . . . , A^} of two generates a unique set 
of integers in some order modulo m. The result is a circle 
consisting of a set of cyclic integers, i.e. a set of unique 
integers with a period. These integers define the "harmonics" 
of the transform in Eqs (2) and (3). Integer coefficients allow 
computations to be done without round-off error or numerical 
overflow, since the results are congruent modulo m [4]. 

The new transform is an extension of the RTs [2], which 
consist of the Fermat Number Transform (FNT) [5] and the 
Mersenne Number Transform (MNT) [6]. When using bit- 
shifts, the FNT and the MNT only utilise moduli 64-bits or 
less in size for small (N ^ 128) transform lengths. The FNT 
only allows larger transform lengths when not in bit-shift form, 
i.e. the sequence is multiplied by powers of integers other 
than two (such as three) in Eqs (2) and (3). A solution is 
to use multi-dimensional techniques which provide a limited 
extension of the transform lengths [3]. Pollard [7] showed that 
these transforms may have an alternate form (that also does 
not utilise bit-shifting) via Euler's Theorem 



a^ = a^(") = l (mod™), 



(6) 



where a, tti <E Z, a and m are coprime and (p{'m) is provided 
in Appendix B. When m is prime, Eq. (6) becomes a™^^ = 1 
(mod to), which is known as Fermat's (little) Theorem. These 
alternate transforms also preserve the CCP while allowing 
arbitrary transform lengths and are referred to as Number 
Theoretic Transforms (NTTs). The modulus to, for these 
transforms are primes of the form p — k ■ N + I so that 
m — 1 = k ■ N, allowing transform lengths of A^ = 2" and 
divisors of N [8]. On the rare occasions when computations 
explicitly require large exponents in Eqs (5) and (6), they can 
still be computed in logarithmic complexity using modular 
exponentiation methods. 

The NTTs are ideal for real data but can also be complex- 
valued [9]. NTTs have been applied to fast multiplication 
of very large integers [5], fast digital convolutions and filter- 
ing [2, 3], encryption [10] and discrete Radon transforms [11]. 
Agarwal and Burrus [2] showed the NTT to be faster than 
the Fast Fourier Transform (FFT) in their implementation. 



Chandra [11] (via the open-source library [12]) showed that a 
modern implementation of the NTT outperforms the popular 
FFTW library. 

II. Carmichael's Theorem 

This section presents a new and more general theory of 
Number Theoretic Transforms (NTTs) utilising the concept 
of primitive roots from Carmichael's Theorem [13] (see 
Appendix B), a generalisation of Euler's Theorem given in 
Eq. (6). The primary result of this new theory are Eqs (2) 
and (3) when using Table I. 

To construct an NTT, one needs a set of unique cyclic 
integers sufficient to represent all the coefficients of a given 
transform length N. In Euler's Theorem, the integer a is a 
special integer called a primitive root (or 0-root [13]) related 
to the modulus, where successive powers {l,...,m— l}of 
a generates all the integers {l,...,m — l}in some unique 
order modulo m (see Fig. 1(b)). This condition works well, 
but is very restrictive as not all integers are 0-roots of a given 
modulus and not all moduli have (/)-roots. For example, the 
integer 2 is only a (/)-root for primes of the form 4 ■ q + 1 
when q itself is prime [14, pg. 102]. Thus, the integer 2 is only 
suitable for prime length NTTs and not a 0-root of primes 
of the form fc • 2" + 1 required for power of two transform 
lengths in this theory. 

Carmichael [13] developed the concept of the primitive 
A-root, where the successive powers {1, . . . ,1^} of this root 
generates a fixed subset of the integers {l,...,m— l}in some 
unique order modulo to, (see Fig. 1(c)). The number of integers 
in this subset is i^, where v is the smallest integer for which 



a'^ — 1 (modjTi,), 



(7) 



is true. Thus, one gets a set of unique cyclic integers of order v 
capable of representing v distinct coefficients. Carmichael [13] 
points out the smallest composite (non-prime) modulus to, for 
when this and Eq. (5) is true is tti = 341. Since 341 = 11-31 
and 2^0 - 1 = 3 • 11 • 31, then z^ = 10 as 2^° = 2^^° = 1 
(mod 341). Such composite moduli are now known as Poulet 
numbers. To construct a unique and sufficient set of coefficients 
for power of two transform lengths, one needs to show that 
i^ = 2" and find a modulus m so that 2 is a A-root of m. 

III. New Transforms 

This section presents the proof of the transform stated in 
Eqs (2) and (3). It will be shown that the primes in Table I 
for this transform can be chosen to be any (including the 
smallest) prime factor of the Fermat numbers (4). The section 
will conclude with a discussion of another useful result. 

A. Multiplication Free 

In order to satisfy Eq. (5), a prime modulus must be selected 
so that 



TO I (22" -1), 



(8) 



i.e. 771 is a prime factor of 2^ — 1, when the transform length 
TV is a power of two. The modulus is chosen to be prime so 
that one may divide the coefficients by any integer, allowing 



the construction of arbitrary exact filters. The value 2^—1 
must be the smallest multiple of m to the base 2 so that 2 is 
a A-root of m as given by (7) and Sec. II. 

The numbers of the form 2^ — 1, which will be denoted 
as the Rader numbers, are a specific form of the Mersenne 
numbers 

M„ = 2" - 1. (9) 

Mersenne numbers can always be expressed as 

2°'' - 1 = (2" - 1) f 1 + 2" + 2^" + . . . + 2'^''-^^") , (10) 

when n is composite, since they are binomial numbers [15, pg. 
42]. Applying this expansion to the Rader numbers 



n — 1 
n = 2 
n = 3 



1 



2'* - 1 = (2^ - 1) 
2^ - 1 = (2^ - 1) 

= (2' - 1) 
n = 4 : 2^6 - 1 = (2^ - 1) 

= (2' - 1) 
= (2' - 1) 



(1 + 22) 

(1 + 22 + 2^ + 28) 

(1 + 22).(1 

(1 + 2^ + 2'^ 

(1 + 2^ + 2"^ + 2*) • (1 + 2*) 

(1 + 2^). (1 + 24). (1 + 28), 



2^) 
-2^ 



+ 214) 



and so on until one arrives at an identity of the Fermat 
numbers [16, pg. 26] 

2^" - 1 - Fo • Fi • i^2 • . . . • K-i = K - 2. (11) 

For example, the first several factorisations of 2" — 1 are 



2^ - 1 == 3, 
2^ - 1 = 3 • 5, 
2^ - 1 = 3^ • 7, 
2^ - 1 = 3 • 5 • 17, 
2i°-l = 3- 11 -31, 



2^ - 1 = 7, 
2^-1 = 31, 
2^ - 1 = 127, 
2^ - 1 = 7 • 73, 
2" - 1 = 23 • 89. 



Eq. (11) suggests that m should be a Fermat number, noting 
that only the first five Fermat numbers are known to be prime. 

Proposition 1 (Fermat Number Moduli). The smallest power 
of two V in Eq. (!) for which 



2^" = 1 (modm), 



(12) 



is true, is when the modulus m is the Fermat number Fn-i- 

Proof: By the identity (11), higher order Fermat numbers 
can only become a factor of a given Rader number as n 
increases. Multiples of these higher order Fermat numbers 
cannot exist as factors of Mersenne numbers 2^ — 1 less than 
the given Rader number since the only divisor of 2" is two 
and powers of two by Eq. (10), i.e. £ cannot be any number 
other than the divisors of 2". Hence by Eq. (11), 2^ — 1 is 
the smallest multiple of F„_i to the base 2 so A^ = 2". ■ 
This results in a reformulation of the FNT, which is only 
suitable for small transform lengths as the Fermat numbers 
grow large rapidly and are composite after F4. Can one extend 
the above theorem to include the prime factors of large Fermat 
numbers? The answer is yes and it is the main theoretical 
result of this letter 



Proposition 2 (Fermat Factor Moduli). The smallest power of 
two v for which Eq. (12) is true, is when the modulus m is a 
factor r (prime or otherwise) of the Fermat number F„_i. 

Proof: Assume the contrary, that there exists a Mersenne 
number 2^ — 1 less than a given Rader number that is also a 
multiple of r. From Eq. (10), I \ 2", but the only divisors of 
2" is two or its powers. Now it is well known that the Fermat 
numbers do not share any common factor with each other, i.e. 
they are pair-wise coprime [15, pg. 63]. Hence, t cannot be a 
power of two less than 2" and so the first multiple of 2^ — 1 
must be the Rader number Thus, for F5 == 641 • 6700417, 641 
or 6700417 allows for N — 2" when n ^ 6, since neither 
prime divides 2^ — 1 for all ^ < 2®. ■ 

We denote r as a Rader prime when r is prime. Some useful 
Rader primes are given in Table I, resulting in the transforms 
given in Eqs (2) and (3). This is a far more useful result than 
Prop. 1 as now the moduli may be small or as large as desired, 
by simply selecting a Fermat number with a suitable small 
or large Rader prime. Although the factorisation of Fermat 
numbers is still an active area of research [17], there are 
sufficient numbers of factors already known to accommodate 
any word size or for data hiding via large moduli. 

Applying the Generalised Fermat and Mersenne Numbers 
to Prop. 2 should extend the NTT of Dimitrov et al. [18]. 
Euler [19] showed that all prime factors of the Fermat numbers 
are of the form k ■ 2"+^ ^ 2, so Prop. 2 should also extend 
the work of Bhattacharya and Astola [8]. Finally, hardware 
implementations of this new transform may be similar to those 
constructed by McClellan [20] and Leibowitz [21] for the FNT, 
since both require simple bit-shifting and prime factors of the 
Fermat numbers are of the form k ■ 2"+^ j^\ [19] ^gg Agarwal 
and Burrus [2, Sec.VI.E] for examples of how to compute 
the bit-shifting. The next section introduces an integer-only 
transform not requiring modulus operations. 

B. Modulus Free 

Carmichael [13] also proves the useful result 

a^""' = 1 (mod2"), (13) 

where a and 2 are coprime (see Appendix B). This result can be 
used in constructing transforms that preserve the CCP, while 
requiring no modulo operations in programming languages 
(such as C) or architectures that support "wrap around" upon 
overflow, i.e. the act of truncation is equivalent to modulo power 
of two (see Fig. 1(d)). This is advantageous because the integer 
division instruction, a critical part of the modulo operation, 
is generally a slow instruction. Note that an expression like 
Eq. (13) is not possible using Euler's Theorem [13]. 

Normalisation of the transform is also a concern, since the 
multiplicative inverse does not exist. This can be resolved by 
ensuring 2" is sufficiently large so that unnormalised values 
do not exceed 2". In other words, if 2^^ is the bit depth of the 
data, then a^ 13 + N. 

The implementations of these transforms can be found 
in the NTTW C library [12]. Applications and performance 
comparisons of the various NTTs, as well as to the DFT, will 
be part of a future publication. 



Conclusion 

Transforms for fast digital convolutions were constructed that 
did not either require any multiplications or modulo operations 
(see Eqs (2), (3), Table I and (13)). The former utilises only 
bit-shifts, additions and modulo operations on prime factors 
of the Fermat numbers (denoted as Rader primes), while the 
latter only uses multiplications and additions. The result was 
made possible by using Carmichael's generalisation of Euler's 
Theorem, which also provides a more general theory of Number 
Theoretic Transforms. 
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Appendix A 
Congruences and Sino Notation 

Sine notation for computing the remainder or modulo 
operation is given as {a)„i, which denotes a (modm), i.e. 
a = {a)m+m-q, with a, {a)„n g, tti G Z so that ^ {a)m < rn. 
The modulo operation allows one to define a congruence, an 
example of which is given in Fig. 2. Multiplicative inverses h~^, 
i.e. the equivalent integers within congruences to do division 
by h (thus turning division into a multiplication), are found 
using the efficient Extended Euclidean algorithm, provided 
they are coprime or their greatest common divisor is one, i.e. 
gcd(6, m) — 1. 



M M 



M 



a 



a = b (mod M) 



Figure 2. Integer a is congruent to integer b when the distance between them 
is a multiple of M. 



Appendix B 
ToTiENT & Lambda Functions 

The Totient function (p{m) is the number of integers less 
than m that do not have a common factor (or are coprime) with 
TO. For example, when m is prime, the function (j>{m) — m~l. 
The Lambda function A (to) is defined in terms of the Totient 
function as follows 



A(to) 



so that A(2"pf 




if TO = p" and p is an odd prime 
if TO = 2" and < a < 2 
if TO = 2" and a > 2 

■ Pj') is the lowest common multiple of 
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A(2"), A(pr ), • ■ • , KpT) for m - 2>r 
one to define Carmichael's Theorem [13] 



■Pn 



This allows 



A(m) 



1 (mod to). 



(14) 



